AITopics | search data

Compressing Search with Language Models

Mulc, Thomas, Steele, Jennifer L.

arXiv.org Artificial Intelligence | Jun-24-2024

Millions of people turn to Google Search each day for information on things as diverse as new cars or flu symptoms. The terms that they enter contain valuable information on their daily intent and activities, but the information in these search terms has been difficult to fully leverage. User-defined categorical filters have been the most common way to shrink the dimensionality of search data to a tractable size for analysis and modeling. In this paper we present a new approach to reducing the dimensionality of search data while retaining much of the information in the individual terms without user-defined rules. Our contributions are two-fold: 1) we introduce SLaM Compression, a way to quantify search terms using pre-trained language models and create a representation of search data that has low dimensionality, is memory efficient, and effectively acts as a summary of search, and 2) we present CoSMo, a Constrained Search Model for estimating real world events using only search data. We demonstrate the efficacy of our contributions by estimating with high accuracy U.S. automobile sales and U.S. flu rates using only Google Search data.
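
The abstract above does not spell out how SLaM Compression combines terms. A minimal sketch of one plausible reading follows, assuming each search term is mapped to a fixed-length embedding and the embeddings are merged by a volume-weighted average. The function names, the four-dimensional vectors, and the toy embedding (a stand-in for a pre-trained language model) are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def toy_embed(term: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a pre-trained language model embedding."""
    vec = [0.0] * dim
    for i, ch in enumerate(term):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def compress_search(volumes: Dict[str, float], dim: int = 4) -> List[float]:
    """Volume-weighted average of term embeddings (assumed aggregation rule).

    However many distinct terms are searched, the result is one vector of
    length `dim`, which is what makes the representation low-dimensional.
    """
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("total search volume must be positive")
    summary = [0.0] * dim
    for term, volume in volumes.items():
        emb = toy_embed(term, dim)
        for j in range(dim):
            summary[j] += (volume / total) * emb[j]
    return summary


# One day's search terms with made-up volumes.
daily = {"flu symptoms": 120.0, "new cars": 80.0, "fever": 50.0}
summary = compress_search(daily)
```

Because the weights are normalized by total volume, the summary reflects the mix of terms rather than the overall amount of searching; a downstream model would need the total volume as a separate feature if it matters.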

dimensionality, search data, search term, (16 more...)

arXiv.org Artificial Intelligence

2407.00085

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Vermont (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Deep Learning-Based Time Series Forecasting

Lin, Chen, Yousefi, Safoora, Kahoro, Elvis, Karisani, Payam, Liang, Donghai, Sarnat, Jeremy, Agichtein, Eugene

arXiv.org Artificial Intelligence | Nov-9-2022

Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, there has been a dramatic increase in air pollution forecasting and monitoring research using artificial neural networks (ANNs). Most of the prior work relied on modeling pollutant concentrations collected from ground-based monitors and meteorological data for long-term forecasting of outdoor ozone, oxides of nitrogen, and PM2.5. Given that traditional, highly sophisticated air quality monitors are expensive and are not universally available, these models cannot adequately serve those not living near pollutant monitoring sites. Furthermore, because prior models were built on physical measurement data collected from sensors, they may not be suitable for predicting public health effects experienced from pollution exposure. This study aims to develop and validate models to nowcast the observed pollution levels using Web search data, which is publicly available in near real-time from major search engines. We developed novel machine learning-based models using both traditional supervised classification methods and state-of-the-art deep learning methods to detect elevated air pollution levels at the US city level, by using generally available meteorological data and aggregate Web-based search volume data derived from Google Trends. We validated the performance of these methods by predicting three critical air pollutants (ozone (O3), nitrogen dioxide (NO2), and fine particulate matter (PM2.5)), across ten major U.S. metropolitan statistical areas (MSAs) in 2017 and 2018.

artificial intelligence, information management, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.2196/23422

2211.05267

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > California > Los Angeles County > Claremont (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Environmental Law (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(4 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multiwave COVID-19 Prediction via Social Awareness-Based Graph Neural Networks using Mobility and Web Search Data

Xue, J., Yabe, T., Tsubouchi, K., Ma, J., Ukkusuri, S. V.

arXiv.org Artificial Intelligence | Oct-22-2021

Recurring outbreaks of COVID-19 have posed enduring effects on global society, which calls for a predictor of pandemic waves using various data with early availability. Existing prediction models that forecast the first outbreak wave using mobility data may not be applicable to the multiwave prediction, because the evidence in the USA and Japan has shown that mobility patterns across different waves exhibit varying relationships with fluctuations in infection cases. Therefore, to predict the multiwave pandemic, we propose a Social Awareness-Based Graph Neural Network (SAB-GNN) that considers the decay of symptom-related web search frequency to capture the changes in public awareness across multiple waves. SAB-GNN combines GNN and LSTM to model the complex relationships among urban districts, inter-district mobility patterns, web search history, and future COVID-19 infections. We train our model to predict future pandemic outbreaks in the Tokyo area using its mobility and web search data from April 2020 to May 2021 across four pandemic waves collected by _ANONYMOUS_COMPANY_ under strict privacy protection rules. Results show our model outperforms other baselines including ST-GNN and MPNN+LSTM. Though our model is not computationally expensive (only 3 layers and 10 hidden neurons), the proposed model enables public agencies to anticipate and prepare for future pandemic outbreaks.
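
The abstract mentions the decay of symptom-related web search frequency without giving a formula. As a rough illustration only, the sketch below turns a search-frequency series into a feature with an exponentially decayed running sum, so that a spike in searches fades over later time steps. The function name, the decay value, and the recurrence are assumptions for illustration and are not SAB-GNN's actual formulation.

```python
from typing import List


def decayed_search_signal(freq: List[float], decay: float = 0.5) -> List[float]:
    """Exponentially decayed running sum: s[t] = freq[t] + decay * s[t-1]."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    signal = []
    previous = 0.0
    for value in freq:
        previous = value + decay * previous
        signal.append(previous)
    return signal


# A burst of symptom searches followed by silence (made-up numbers).
signal = decayed_search_signal([8.0, 4.0, 0.0, 0.0], decay=0.5)
print(signal)  # [8.0, 8.0, 4.0, 2.0]
```

A smaller `decay` makes the feature forget past searches faster; in a learned model this rate would more likely be a fitted parameter than a fixed constant.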

infection case, prediction, urban district, (10 more...)

arXiv.org Artificial Intelligence

2110.11584

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.50)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Proper Use of Google Trends in Forecasting Models

Medeiros, Marcelo C., Pires, Henrique F.

arXiv.org Machine Learning | Apr-10-2021

It is widely known that Google Trends has become one of the most popular free tools used by forecasters in academia and in the private and public sectors. Many papers, from several different fields, conclude that Google Trends improves forecast accuracy. However, what seems to be far less widely known is that each sample of Google search data is different from the others, even if you set the same search term, data and location. This means that it is possible to reach arbitrary conclusions merely by chance. This paper aims to show why and when this can become a problem and how to overcome it.
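
The abstract does not say how the authors address the sampling problem. One common mitigation, assumed here purely for illustration, is to draw the same query several times and average the resulting series. The sketch simulates the repeated pulls with Gaussian noise around a made-up index; `draw_sample`, the noise level, and the number of pulls are hypothetical, and no real Google Trends API is called.

```python
import math
import random
import statistics
from typing import List, Sequence


def draw_sample(true_index: Sequence[float], noise_sd: float,
                rng: random.Random) -> List[float]:
    """Simulate one noisy pull of the same query (assumed Gaussian noise)."""
    return [value + rng.gauss(0.0, noise_sd) for value in true_index]


def average_samples(samples: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise mean across repeated pulls of the same series."""
    return [statistics.fmean(column) for column in zip(*samples)]


def rmse(estimate: Sequence[float], truth: Sequence[float]) -> float:
    """Root-mean-square error between an estimate and the true series."""
    squared = [(e - t) ** 2 for e, t in zip(estimate, truth)]
    return math.sqrt(statistics.fmean(squared))


rng = random.Random(0)  # fixed seed so the simulation is repeatable
true_index = [50.0 + 10.0 * math.sin(t / 2.0) for t in range(12)]
samples = [draw_sample(true_index, 5.0, rng) for _ in range(30)]

averaged = average_samples(samples)
single_error = rmse(samples[0], true_index)
averaged_error = rmse(averaged, true_index)
print(f"single pull RMSE: {single_error:.2f}, averaged RMSE: {averaged_error:.2f}")
```

Under independent noise, the error of the averaged series shrinks roughly with the square root of the number of pulls, which is why a conclusion drawn from a single pull can be fragile.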

different sample, gdp growth, google trend, (12 more...)

arXiv.org Machine Learning

2104.03065

Country:

North America > United States (0.30)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Oceania > New Zealand (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.96)
Banking & Finance (0.72)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Information Management > Search (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Zero-Shot Heterogeneous Transfer Learning from Recommender Systems to Cold-Start Search Retrieval

Wu, Tao, Chio, Ellie Ka-In, Cheng, Heng-Tze, Du, Yu, Rendle, Steffen, Kuzmin, Dima, Agarwal, Ritesh, Zhang, Li, Anderson, John, Singh, Sarvjeet, Chandra, Tushar, Chi, Ed H., Li, Wen, Kumar, Ankit, Ma, Xiang, Soares, Alex, Jindal, Nitin, Cao, Pei

arXiv.org Machine Learning | Aug-18-2020

Many recent advances in neural information retrieval models, which predict top-K items given a query, learn directly from a large training set of (query, item) pairs. However, they are often insufficient when there are many previously unseen (query, item) combinations, often referred to as the cold start problem. Furthermore, the search system can be biased towards items that are frequently shown to a query previously, also known as the 'rich get richer' (a.k.a. feedback loop) problem. In light of these problems, we observed that most online content platforms have both a search and a recommender system that, while having heterogeneous input spaces, can be connected through their common output item space and a shared semantic representation. In this paper, we propose a new Zero-Shot Heterogeneous Transfer Learning framework that transfers learned knowledge from the recommender system component to improve the search component of a content platform. First, it learns representations of items and their natural-language features by predicting (item, item) correlation graphs derived from the recommender system as an auxiliary task. Then, the learned representations are transferred to solve the target search retrieval task, performing query-to-item prediction without having seen any (query, item) pairs in training. We conduct online and offline experiments on one of the world's largest search and recommender systems from Google, and present the results and lessons learned. We demonstrate that the proposed approach can achieve high performance on offline search retrieval tasks, and more importantly, achieved significant improvements on relevance and user interactions over the highly-optimized production system in online experiments.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

doi: 10.1145/3340531.3412752

2008.0293

Country: Asia > Middle East > Lebanon (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

China and scientists dismiss study suggesting coronavirus spread in August 2019

The Japan Times | Jun-10-2020, 09:30:44 GMT

LONDON – Beijing dismissed as "ridiculous" a Harvard Medical School study of hospital traffic and search engine data that suggested the novel coronavirus may already have been spreading in China last August, and scientists said it offered no convincing evidence of when the outbreak began. The research, which has not been peer-reviewed by other scientists, used satellite imagery of hospital parking lots in Wuhan -- where the disease was first identified in late 2019 -- and data for symptom-related queries on search engines for terms such as "cough" and "diarrhea." The study's authors said increased hospital traffic and symptom search data in Wuhan preceded the documented start of the coronavirus pandemic, in December 2019. "While we cannot confirm if the increased volume was directly related to the new virus, our evidence supports other recent work showing that emergence happened before identification at the Huanan Seafood market (in Wuhan)," they said. Paul Digard, an expert in virology at the University of Edinburgh, said that using search engine data and satellite imagery of hospital traffic to detect disease outbreaks "is an interesting idea with some validity."

china and scientist dismiss study, information retrieval, natural language, (13 more...)

The Japan Times

Country:

Asia > China > Hubei Province > Wuhan (0.71)
Asia > China > Beijing > Beijing (0.26)
Asia > Japan (0.07)
Europe > United Kingdom (0.06)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Information Management > Search (0.80)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.80)

China pushes back against Harvard coronavirus study

Al Jazeera | Jun-10-2020, 07:35:12 GMT

Beijing has dismissed as "ridiculous" a Harvard Medical School study of hospital traffic and search engine data that suggested the new coronavirus may already have been spreading in China last August, and scientists said it offered no convincing evidence of when the outbreak began. Chinese Foreign Ministry spokeswoman Hua Chunying, asked about the research at a news briefing on Tuesday, said: "I think it is ridiculous, incredibly ridiculous, to come up with this conclusion based on superficial observations such as traffic volume." The research, which has not been peer-reviewed by other scientists, used satellite imagery of hospital parking lots in Wuhan - where the disease was first identified in late 2019 - and data for symptom-related queries on search engines for things such as "cough" and "diarrhoea". The study's authors said increased hospital traffic and symptom search data in Wuhan preceded the documented start of the coronavirus pandemic in December 2019. "While we cannot confirm if the increased volume was directly related to the new virus, our evidence supports other recent work showing that emergence happened before identification at the Huanan Seafood market (in Wuhan)," they said.

harvard coronavirus study, information retrieval, natural language, (16 more...)

Al Jazeera

Country:

Asia > China > Hubei Province > Wuhan (0.72)
Asia > China > Beijing > Beijing (0.26)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Information Management > Search (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.60)

Google tells 1.1million children that Santa doesn't exist

#artificialintelligence | Dec-13-2019, 02:57:54 GMT

Do you remember the moment you found out the truth about Santa? Analysis of Google search data surrounding Santa found that on average, 1,116,500 children ask Google "Is Santa Real" each year. And when exploring the answer provided by the world's leading search engine, Google displays an article with an opening sentence saying "as adults we know Santa Claus isn't real". The article, written by online publisher Quartz, aims to give advice to parents regarding what to say when your child asks "Is Santa Real?" but doesn't realise that the opening sentence of their article is the first to be seen by over a million children worldwide, shattering their beliefs instantly. Speaking to experts in Google search results, Stephen Kenwright, Technical Search Engine Optimisation director at Rise at Seven, said that "Google is ranking this article on Quartz as the no.1 result based on the authority of the domain and reliability of the content. "Google's algorithms choose the answer which best answers the question searched, taking safety into consideration all whilst being factually accurate.

artificial intelligence, information management, natural language, (12 more...)

#artificialintelligence

Genre: Research Report (0.37)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Using Search Queries to Understand Health Information Needs in Africa

Abebe, Rediet, Hill, Shawndra, Vaughan, Jennifer Wortman, Small, Peter M., Schwartz, H. Andrew

arXiv.org Artificial Intelligence | Jun-14-2018

The lack of comprehensive, high-quality health data in developing nations creates a roadblock for combating the impacts of disease. One key challenge is understanding the health information needs of people in these nations. Without understanding people's everyday needs, concerns, and misconceptions, health organizations and policymakers lack the ability to effectively target education and programming efforts. In this paper, we propose a bottom-up approach that uses search data from individuals to uncover and gain insight into health information needs in Africa. We analyze Bing searches related to HIV/AIDS, malaria, and tuberculosis from all 54 African nations. For each disease, we automatically derive a set of common search themes or topics, revealing a wide-spread interest in various types of information, including disease symptoms, drugs, concerns about breastfeeding, as well as stigma, beliefs in natural cures, and other topics that may be hard to uncover through traditional surveys. We expose the different patterns that emerge in health information needs by demographic groups (age and sex) and country. We also uncover discrepancies in the quality of content returned by search engines to users by topic. Combined, our results suggest that search data can help illuminate health information needs in Africa and inform discussions on health policy and targeted education efforts both on- and offline.

bioinformatics, knowledge management, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1806.0574

Country:

Africa > Nigeria (0.05)
Africa > Botswana (0.05)
Africa > West Africa (0.04)
(18 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.99)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Knowledge Management (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
(3 more...)

Views of AI, robots, and automation based on internet search data

#artificialintelligence | Jun-8-2018, 11:56:57 GMT

Artificial intelligence, robots, and automation are rising in importance in many areas. As noted in the recent book, "The Future of Work: Robots, AI, and Automation," there are exciting advances in finance, transportation, national defense, smart cities, and health care, among other areas. Businesses are developing solutions that improve the efficiency and effectiveness of their operations and using these tools to improve the way their firms function. Yet there also are concerns about the impact of these developments on jobs and personal privacy. A Pew Research Center national survey revealed considerable unease about emerging trends.

artificial intelligence, automation, robot, (15 more...)

#artificialintelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
Asia > China (0.06)
North America > United States > Virginia > Albemarle County > Charlottesville (0.05)
(16 more...)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)
Government > Regional Government (0.70)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)
